The Pentagon is planning for AI companies to train on classified data, defense official says
The generative AI models used in classified environments can answer questions but don't currently learn from the data they see. The Pentagon is discussing plans to set up secure environments in which generative AI companies could train military-specific versions of their models on classified data. AI models like Anthropic's Claude are already used to answer questions in classified settings; applications include analyzing targets in Iran. But allowing models to train on and learn from classified data would be a new development that presents unique security risks: sensitive intelligence such as surveillance reports or battlefield assessments could become embedded in the models themselves, and AI firms would come into closer contact with classified data than before. Training versions of AI models on classified data is expected to make them more accurate and effective at certain tasks, according to a US defense official who spoke on background.
- Asia > Middle East > Iran (0.25)
- North America > United States > Massachusetts (0.05)
- Information Technology (1.00)
- Government > Military (1.00)
- North America > United States > California (0.04)
- Europe > Slovakia (0.04)
- Europe > Czechia (0.04)
RECKONING: Reasoning through Dynamic Knowledge Encoding
Recent studies on transformer-based language models show that they can answer questions by reasoning over knowledge provided as part of the context (i.e., in-context reasoning). However, since the available knowledge is often not filtered for a particular question, in-context reasoning can be sensitive to distractor facts: additional content that is irrelevant to the question at hand but that may be relevant for a different question (i.e., not necessarily random noise). In these situations, the model fails to distinguish the knowledge necessary to answer the question, leading to spurious reasoning and degraded performance. This reasoning failure contrasts with the model's apparent ability to distinguish its contextual knowledge from all the knowledge it has memorized during pre-training. Following this observation, we propose teaching the model to reason more robustly by folding the provided contextual knowledge into the model's parameters before presenting it with a question. Our method, RECKONING, is a bi-level learning algorithm that teaches language models to reason by updating their parametric knowledge through back-propagation, allowing them to answer questions using the updated parameters.
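The bi-level structure described above can be illustrated with a deliberately simplified sketch: an inner loop "memorizes" the provided knowledge by gradient descent on the parameters, and the question is then answered with the updated parameters. The quadratic losses and the linear parameter vector below are stand-ins invented for illustration, not RECKONING's actual language-modeling and QA objectives.

```python
import numpy as np

def inner_loss(w, knowledge):
    # Stand-in for a language-modeling loss on the contextual facts.
    return 0.5 * np.sum((w - knowledge) ** 2)

def inner_update(w, knowledge, lr=0.5, steps=3):
    # Inner loop: fold the knowledge into the parameters via gradient descent.
    for _ in range(steps):
        grad = w - knowledge          # d inner_loss / d w
        w = w - lr * grad
    return w

def outer_loss(w_updated, answer_target):
    # Stand-in for the QA loss, evaluated with the *updated* parameters;
    # the outer loop would back-propagate through inner_update to train w.
    return 0.5 * np.sum((w_updated - answer_target) ** 2)

w0 = np.zeros(4)                       # "pre-trained" parameters
knowledge = np.array([1.0, 2.0, 3.0, 4.0])
w_k = inner_update(w0, knowledge)      # parameters after encoding the knowledge
print(outer_loss(w_k, knowledge))      # much lower than outer_loss(w0, knowledge)
```

In the full method, the outer objective is differentiated through the inner gradient steps, so the model's initial parameters are trained to become good "memorizers" of whatever knowledge is provided.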
FHIR-AgentBench: Benchmarking LLM Agents for Realistic Interoperable EHR Question Answering
Lee, Gyubok, Bach, Elea, Yang, Eric, Pollard, Tom, Johnson, Alistair, Choi, Edward, Jia, Yugang, Lee, Jong Ha
The recent shift toward the Health Level Seven Fast Healthcare Interoperability Resources (HL7 FHIR) standard opens a new frontier for clinical AI, requiring LLM agents to navigate complex, resource-based data models instead of conventional structured health data. However, existing benchmarks have lagged behind this transition, lacking the realism needed to evaluate recent LLMs on interoperable clinical data. To bridge this gap, we introduce FHIR-AgentBench--a benchmark that grounds 2,931 real-world clinical questions in the HL7 FHIR standard. Using this benchmark, we systematically evaluate agentic frameworks, comparing different data retrieval strategies (direct FHIR API calls vs. specialized tools), interaction patterns (single-turn vs. multi-turn), and reasoning strategies (natural language vs. code generation). Our experiments highlight the practical challenges of retrieving data from intricate FHIR resources and the difficulty of reasoning over them--both of which critically affect question answering performance.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Asia > South Korea (0.04)
- Research Report > Experimental Study (0.48)
- Research Report > New Finding (0.46)
- Health & Medicine > Health Care Technology > Medical Record (0.95)
- Health & Medicine > Diagnostic Medicine (0.88)
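The "direct FHIR API call" retrieval style mentioned in the abstract can be sketched as follows: a FHIR search returns a Bundle resource, and the agent must walk its nested entries to pull out answer-relevant fields. This is a minimal illustration, not the benchmark's actual harness, and the hand-made Bundle below contains no real patient data.

```python
# A FHIR searchset Bundle, as returned by e.g. GET [base]/Observation?patient=...
bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Observation",
                      "code": {"text": "Heart rate"},
                      "valueQuantity": {"value": 72, "unit": "beats/minute"}}},
        {"resource": {"resourceType": "Observation",
                      "code": {"text": "Body temperature"},
                      "valueQuantity": {"value": 37.1, "unit": "Cel"}}},
    ],
}

def extract_observations(bundle):
    """Flatten a FHIR searchset Bundle into (name, value, unit) tuples."""
    results = []
    for entry in bundle.get("entry", []):
        res = entry.get("resource", {})
        if res.get("resourceType") != "Observation":
            continue
        quantity = res.get("valueQuantity", {})
        results.append((res.get("code", {}).get("text"),
                        quantity.get("value"),
                        quantity.get("unit")))
    return results

print(extract_observations(bundle))
```

Even this toy case shows why FHIR is harder for agents than flat tables: the answer to a clinical question is spread across deeply nested, optional fields rather than named columns.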
Semantic World Models
Berg, Jacob, Zhu, Chuning, Bao, Yanda, Durugkar, Ishan, Gupta, Abhishek
Planning with world models offers a powerful paradigm for robotic control. Conventional approaches train a model to predict future frames conditioned on current frames and actions, which can then be used for planning. However, the objective of predicting future pixels is often at odds with the actual planning objective; strong pixel reconstruction does not always correlate with good planning decisions. This paper posits that instead of reconstructing future frames as pixels, world models only need to predict task-relevant semantic information about the future. To make such predictions, the paper poses world modeling as a visual question answering problem about semantic information in future frames. This perspective allows world modeling to be approached with the same tools underlying vision-language models. Thus, vision-language models can be trained as "semantic" world models through a supervised finetuning process on image-action-text data, enabling planning for decision-making while inheriting many of the generalization and robustness properties of the pretrained vision-language models. The paper demonstrates how such a semantic world model can be used for policy improvement on open-ended robotics tasks, leading to significant generalization improvements over typical paradigms of reconstruction-based action-conditional world modeling. Website available at https://weirdlabuw.github.io/swm.
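The "world modeling as VQA" framing above amounts to a particular supervised-finetuning data format: each example pairs a current observation and a candidate action with a question about task-relevant semantics of the resulting state, and the target is a text answer instead of future pixels. The field names and prompt template below are assumptions for illustration, not the paper's actual schema.

```python
def make_sft_example(action, question, answer):
    """Build one image-action-text finetuning example for a 'semantic' world model."""
    prompt = (f"Action to be executed: {action}\n"
              f"Question about the resulting state: {question}\n"
              f"Answer:")
    # The image field would hold the current camera frame; a placeholder here.
    return {"image": "<current_frame>", "prompt": prompt, "target": answer}

example = make_sft_example(
    action="move gripper 5 cm left",
    question="Is the red block on top of the blue block?",
    answer="yes",
)
print(example["prompt"])
```

A planner can then score candidate actions by asking the finetuned model goal-relevant questions about each action's outcome, with no pixel reconstruction in the loop.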
Can Large Language Models Bridge the Gap in Environmental Knowledge?
Smail, Linda, Calonge, David Santandreu, Kamalov, Firuz, Orak, Nur H.
The investigation employs a standardized tool, the Environmental Knowledge Test (EKT-19), supplemented by targeted questions, to evaluate the environmental knowledge of university students in comparison to the responses generated by the AI models. The results of this study suggest that while AI models possess a vast, readily accessible, and valid knowledge base with the potential to empower both students and academic staff, a human discipline specialist in environmental sciences may still be necessary to validate the accuracy of the information provided.
Keywords: Environmental Education; AI Models; EKT-19
1. Introduction
Extreme weather events, increasing global temperatures, rising sea levels, and changes to ecosystems and biodiversity are all consequences of climate change, which is mostly caused by anthropogenic greenhouse gas emissions (Masson-Delmotte et al., 2018). Meanwhile, the loss of biodiversity due to habitat degradation, pollution, overexploitation, and invasive species threatens the resilience of society's ecosystems (Nature, 2021). These consequences pose questions regarding food security, public health, and socioeconomic stability. Thus, effective access to accurate environmental knowledge is crucial for developing sustainable solutions and informed environmental policies.
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)
- South America > Brazil > Pernambuco > Recife (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Energy (1.00)
- Law > Environmental Law (0.69)
- Education > Educational Setting > Higher Education (0.48)
What is Grok and why has Elon Musk's chatbot been accused of anti-Semitism?
Elon Musk's artificial intelligence company xAI has come under fire after its chatbot Grok stirred controversy with anti-Semitic responses to questions posed by users – just weeks after Musk said he would rebuild it because he felt it was too politically correct. On Friday last week, Musk announced that xAI had made significant improvements to Grok, promising a major upgrade "within a few days". Online tech news site The Verge reported that, by Sunday evening, xAI had already added new lines to Grok's publicly posted system prompts. By Tuesday, Grok had drawn widespread backlash after generating inflammatory responses – including anti-Semitic comments. One Grok user who asked "which 20th-century figure would be best suited to deal with this problem (anti-white hate)" received an anti-Semitic response beginning: "To deal with anti-white hate?" Here's what we know about the Grok chatbot and the controversies it has caused. Grok, a chatbot created by xAI – the AI company Elon Musk ...
- Asia > Middle East > Republic of Türkiye (0.29)
- North America > United States (0.15)
- Europe > Poland (0.06)
- (3 more...)
- Law > Civil Rights & Constitutional Law (1.00)
- Government (1.00)
Structured Attention Matters to Multimodal LLMs in Document Understanding
Liu, Chang, Chen, Hongkai, Cai, Yujun, Wu, Hang, Ye, Qingwen, Yang, Ming-Hsuan, Wang, Yiwei
Document understanding remains a significant challenge for multimodal large language models (MLLMs). While previous research has primarily focused on locating evidence pages through precise multimodal queries, our work investigates a fundamental yet overlooked aspect: how input format influences document comprehension performance. Through systematic analysis, we discover that raw OCR text often impairs rather than improves MLLMs' performance, a counterintuitive finding we attribute to attention dispersion and structure loss. To further substantiate our hypothesis, we propose a novel structure-preserving approach that encodes document elements using the LaTeX paradigm, maintaining the hierarchical organization and spatial relationships critical for comprehension. Our attention analysis reveals that structured text induces structured attention patterns on both textual and visual content, directing models to focus on semantically meaningful regions while reducing attention waste. This approach significantly enhances MLLMs' document question answering performance across diverse document types without requiring architectural modifications or additional training.
- Europe > Germany (0.04)
- Oceania > Australia > Queensland (0.04)
- North America > United States > California > Merced County > Merced (0.04)
- (3 more...)
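The structure-preserving idea in the abstract above can be sketched with a small contrast: flat OCR output discards cell boundaries, whereas rendering a detected table in LaTeX markup keeps rows and columns explicit for the model. The element schema and rendering function here are invented for illustration; the paper's actual encoding pipeline may differ.

```python
def table_to_latex(rows):
    """Render a list of rows (lists of cell strings) as a LaTeX tabular."""
    ncols = len(rows[0])
    body = " \\\\\n".join(" & ".join(row) for row in rows)
    return ("\\begin{tabular}{" + "l" * ncols + "}\n"
            + body + " \\\\\n\\end{tabular}")

ocr_cells = [["Metric", "Score"], ["ANLS", "0.81"], ["EM", "0.64"]]

flat_ocr = " ".join(cell for row in ocr_cells for cell in row)  # structure lost
latex_table = table_to_latex(ocr_cells)                         # structure kept

print(flat_ocr)
print(latex_table)
```

In the flat string, "ANLS 0.81 EM 0.64" is just a token stream; in the LaTeX form, `&` and `\\` mark exactly which value belongs to which metric, which is the kind of hierarchy the paper argues structured attention needs.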